Chemical Compound Classification with Automatically Mined Structure Patterns

نویسندگان

  • Aaron M. Smalter
  • Jun Huan
  • Gerald H. Lushington
چکیده

In this paper we propose new methods of chemical structure classification based on the integration of graph database mining from data mining and graph kernel functions from machine learning. In our method, we first identify a set of general graph patterns in chemical structure data. These patterns are then used to augment a graph kernel function that calculates the pairwise similarity between molecules. The obtained similarity matrix is used as input to classify chemical compounds via a kernel machines such as the support vector machine (SVM). Our results indicate that the use of a pattern-based approach to graph similarity yields performance profiles comparable to, and sometimes exceeding that of the existing state-of-the-art approaches. In addition, the identification of highly discriminative patterns for activity classification provides evidence that our methods can make generalizations about a compound's function given its chemical structure. While we evaluated our methods on molecular structures, these methods are designed to operate on general graph data and hence could easily be applied to other domains in bioinformatics.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Comparison of Google N-grams and Gigaword Dependencies as Automatically Mined Features in Temporal Relation Extraction

This paper describes a system for learning before-after temporal relations by training on text and dependency relation patterns mined from large-scale corpora. We mined the Google N-gram corpus to extract good text patterns to use as features in a temporal ordering classification system. Though we identified these as being qualitatively good features. Classification results were just above base...

متن کامل

Other-Anaphora Resolution in Biomedical Texts with Automatically Mined Patterns

This paper proposes an other-anaphora resolution approach in bio-medical texts. It utilizes automatically mined patterns to discover the semantic relation between an anaphor and a candidate antecedent. The knowledge from lexical patterns is incorporated in a machine learning framework to perform anaphora resolution. The experiments show that machine learning approach combined with the auto-mine...

متن کامل

Automatically Mining Question Reformulation Patterns from Search Log Data

Natural language questions have become popular in web search. However, various questions can be formulated to convey the same information need, which poses a great challenge to search systems. In this paper, we automatically mined 5w1h question reformulation patterns from large scale search log data. The question reformulations generated from these patterns are further incorporated into the ret...

متن کامل

Combining pattern recognition and deep-learning-based algorithms to automatically detect commercial quadcopters using audio signals (Research Article)

Commercial quadcopters with many private, commercial, and public sector applications are a rapidly advancing technology. Currently, there is no guarantee to facilitate the safe operation of these devices in the community. Three different automatic commercial quadcopters identification methods are presented in this paper. Among these three techniques, two are based on deep neural networks in whi...

متن کامل

Mining the Strongest Patterns in Medical Sequential Data

Sequential data represent an important source of automatically mined and potentially new medical knowledge. They can originate in various ways. Within the presented domain they come from a longitudinal preventive study of atherosclerosis – the data consist of series of long-term observations recording the development of risk factors and associated conditions. The intention is to identify freque...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • Proceedings of the ... Asia-Pacific bioinformatics conference

دوره 6  شماره 

صفحات  -

تاریخ انتشار 2008